Discrete-Time Multi-Player Games Based on Off-Policy Q-Learning
Similar resources
Balancing Two-Player Stochastic Games with Soft Q-Learning
Within the context of video games the notion of perfectly rational agents can be undesirable as it leads to uninteresting situations, where humans face tough adversarial decision makers. Current frameworks for stochastic games and reinforcement learning prohibit tuneable strategies as they seek optimal performance. In this paper, we enable such tuneable behaviour by generalising soft Q-learning...
Q-learning in Two-Player Two-Action Games
Q-learning is a simple, powerful algorithm for behavior learning. It was derived in the context of single agent decision making in Markov decision process environments, but its applicability is much broader— in experiments in multiagent environments, Q-learning has also performed well. Our preliminary analysis finds that Q-learning’s indirect control of behavior via estimates of value contribut...
Multi-agent discrete-time graphical games and reinforcement learning solutions
This paper introduces a new class of multi-agent discrete-time dynamic games, known in the literature as dynamic graphical games. For that reason a local performance index is defined for each agent that depends only on the local information available to each agent. Nash equilibrium policies and best-response policies are given in terms of the solutions to the discrete-time coupled Hamilton–Jaco...
Two Player Non Zero-sum Stopping Games in Discrete Time
We prove that every two player non zero-sum stopping game in discrete time admits an ε-equilibrium in randomized strategies, for every ε > 0. We use a stochastic variation of Ramsey Theorem, which enables us to reduce the problem to that of studying properties of ε-equilibria in a simple class of stochastic games with finite state space.
Non-Stationary Policy Learning in 2-Player Zero Sum Games
A key challenge in multiagent environments is the construction of agents that are able to learn while acting in the presence of other agents that are simultaneously learning and adapting. These domains require on-line learning methods without the benefit of repeated training examples, as well as the ability to adapt to the evolving behavior of other agents in the environment. The difficulty is ...
Journal
Journal title: IEEE Access
Year: 2019
ISSN: 2169-3536
DOI: 10.1109/access.2019.2939384